{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Date and Time features with Feature-engine\n",
    "\n",
    "[Feature Engineering for Time Series Forecasting](https://www.trainindata.com/p/feature-engineering-for-forecasting)\n",
    "\n",
    "We can automate the extraction of date and time features with Feature-engine, using the [DatetimeFeatures](https://feature-engine.readthedocs.io/en/latest/api_doc/datetime/DatetimeFeatures.html) transformer.\n",
    "\n",
    "## The dataset\n",
    "\n",
    "We will use the Online Retail II Data Set available in the [UCI Machine Learning Repository](https://archive.ics.uci.edu/ml/machine-learning-databases/00502/).\n",
    "\n",
    "Download the xlsx file from the link above and save it in the **Datasets** folder within this repo.\n",
    "\n",
    "**Citation**:\n",
    "\n",
    "Dua, D. and Graff, C. (2019). UCI Machine Learning Repository [http://archive.ics.uci.edu/ml]. Irvine, CA: University of California, School of Information and Computer Science.\n",
    "\n",
    "## In this demo\n",
    "\n",
    "We will extract different time-related features from the datetime variable: **InvoiceDate**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import pandas as pd\n",
    "import matplotlib.pyplot as plt\n",
    "\n",
    "from feature_engine.datetime import DatetimeFeatures"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Load the data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "(1067371, 8)\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Invoice</th>\n",
       "      <th>StockCode</th>\n",
       "      <th>Description</th>\n",
       "      <th>Quantity</th>\n",
       "      <th>InvoiceDate</th>\n",
       "      <th>Price</th>\n",
       "      <th>Customer ID</th>\n",
       "      <th>Country</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>489434</td>\n",
       "      <td>85048</td>\n",
       "      <td>15CM CHRISTMAS GLASS BALL 20 LIGHTS</td>\n",
       "      <td>12</td>\n",
       "      <td>2009-12-01 07:45:00</td>\n",
       "      <td>6.95</td>\n",
       "      <td>13085.0</td>\n",
       "      <td>United Kingdom</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>489434</td>\n",
       "      <td>79323P</td>\n",
       "      <td>PINK CHERRY LIGHTS</td>\n",
       "      <td>12</td>\n",
       "      <td>2009-12-01 07:45:00</td>\n",
       "      <td>6.75</td>\n",
       "      <td>13085.0</td>\n",
       "      <td>United Kingdom</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>489434</td>\n",
       "      <td>79323W</td>\n",
       "      <td>WHITE CHERRY LIGHTS</td>\n",
       "      <td>12</td>\n",
       "      <td>2009-12-01 07:45:00</td>\n",
       "      <td>6.75</td>\n",
       "      <td>13085.0</td>\n",
       "      <td>United Kingdom</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>489434</td>\n",
       "      <td>22041</td>\n",
       "      <td>RECORD FRAME 7\" SINGLE SIZE</td>\n",
       "      <td>48</td>\n",
       "      <td>2009-12-01 07:45:00</td>\n",
       "      <td>2.10</td>\n",
       "      <td>13085.0</td>\n",
       "      <td>United Kingdom</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>489434</td>\n",
       "      <td>21232</td>\n",
       "      <td>STRAWBERRY CERAMIC TRINKET BOX</td>\n",
       "      <td>24</td>\n",
       "      <td>2009-12-01 07:45:00</td>\n",
       "      <td>1.25</td>\n",
       "      <td>13085.0</td>\n",
       "      <td>United Kingdom</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  Invoice StockCode                          Description  Quantity  \\\n",
       "0  489434     85048  15CM CHRISTMAS GLASS BALL 20 LIGHTS        12   \n",
       "1  489434    79323P                   PINK CHERRY LIGHTS        12   \n",
       "2  489434    79323W                  WHITE CHERRY LIGHTS        12   \n",
       "3  489434     22041         RECORD FRAME 7\" SINGLE SIZE         48   \n",
       "4  489434     21232       STRAWBERRY CERAMIC TRINKET BOX        24   \n",
       "\n",
       "          InvoiceDate  Price  Customer ID         Country  \n",
       "0 2009-12-01 07:45:00   6.95      13085.0  United Kingdom  \n",
       "1 2009-12-01 07:45:00   6.75      13085.0  United Kingdom  \n",
       "2 2009-12-01 07:45:00   6.75      13085.0  United Kingdom  \n",
       "3 2009-12-01 07:45:00   2.10      13085.0  United Kingdom  \n",
       "4 2009-12-01 07:45:00   1.25      13085.0  United Kingdom  "
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# File path:\n",
    "file = \"../Datasets/online_retail_II.xlsx\"\n",
    "\n",
    "# The data is provided as two sheets in a single Excel file.\n",
    "# Each sheet contains a different time period.\n",
    "# Load both and join them into a single dataframe\n",
    "# as shown below:\n",
    "\n",
    "df_1 = pd.read_excel(file, sheet_name=\"Year 2009-2010\")\n",
    "df_2 = pd.read_excel(file, sheet_name=\"Year 2010-2011\")\n",
    "\n",
    "data = pd.concat([df_1, df_2])\n",
    "\n",
    "print(data.shape)\n",
    "\n",
    "data.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In this dataset, we have the datetime variable in a column called InvoiceDate. We could also have it in the dataframe index. The procedure for extracting the date and time features is identical. That is, we would use the methods from pandas dt as shown below.\n",
    "\n",
    "The dataset contains sales information for different customers in different countries. Customers may have made one or multiple purchases from the business that provided the data.\n",
    "\n",
    "## Variable format"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "dtype('<M8[ns]')"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Let's determine the type of data in the datetime variable.\n",
    "\n",
    "data[\"InvoiceDate\"].dtypes"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Extract all possible features\n",
    "\n",
    "DatetimeFeatures can extract all possible features out-of-the-box."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "dtfs = DatetimeFeatures(\n",
    "    variables=None,  # it identifies the datetime variable automatically.\n",
    "    features_to_extract=\"all\",\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>InvoiceDate_month</th>\n",
       "      <th>InvoiceDate_quarter</th>\n",
       "      <th>InvoiceDate_semester</th>\n",
       "      <th>InvoiceDate_year</th>\n",
       "      <th>InvoiceDate_week</th>\n",
       "      <th>InvoiceDate_day_of_week</th>\n",
       "      <th>InvoiceDate_day_of_month</th>\n",
       "      <th>InvoiceDate_day_of_year</th>\n",
       "      <th>InvoiceDate_weekend</th>\n",
       "      <th>InvoiceDate_month_start</th>\n",
       "      <th>InvoiceDate_month_end</th>\n",
       "      <th>InvoiceDate_quarter_start</th>\n",
       "      <th>InvoiceDate_quarter_end</th>\n",
       "      <th>InvoiceDate_year_start</th>\n",
       "      <th>InvoiceDate_year_end</th>\n",
       "      <th>InvoiceDate_leap_year</th>\n",
       "      <th>InvoiceDate_days_in_month</th>\n",
       "      <th>InvoiceDate_hour</th>\n",
       "      <th>InvoiceDate_minute</th>\n",
       "      <th>InvoiceDate_second</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>12</td>\n",
       "      <td>4</td>\n",
       "      <td>2</td>\n",
       "      <td>2009</td>\n",
       "      <td>49</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>335</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>31</td>\n",
       "      <td>7</td>\n",
       "      <td>45</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>12</td>\n",
       "      <td>4</td>\n",
       "      <td>2</td>\n",
       "      <td>2009</td>\n",
       "      <td>49</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>335</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>31</td>\n",
       "      <td>7</td>\n",
       "      <td>45</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>12</td>\n",
       "      <td>4</td>\n",
       "      <td>2</td>\n",
       "      <td>2009</td>\n",
       "      <td>49</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>335</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>31</td>\n",
       "      <td>7</td>\n",
       "      <td>45</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>12</td>\n",
       "      <td>4</td>\n",
       "      <td>2</td>\n",
       "      <td>2009</td>\n",
       "      <td>49</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>335</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>31</td>\n",
       "      <td>7</td>\n",
       "      <td>45</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>12</td>\n",
       "      <td>4</td>\n",
       "      <td>2</td>\n",
       "      <td>2009</td>\n",
       "      <td>49</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>335</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>31</td>\n",
       "      <td>7</td>\n",
       "      <td>45</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   InvoiceDate_month  InvoiceDate_quarter  InvoiceDate_semester  \\\n",
       "0                 12                    4                     2   \n",
       "1                 12                    4                     2   \n",
       "2                 12                    4                     2   \n",
       "3                 12                    4                     2   \n",
       "4                 12                    4                     2   \n",
       "\n",
       "   InvoiceDate_year  InvoiceDate_week  InvoiceDate_day_of_week  \\\n",
       "0              2009                49                        1   \n",
       "1              2009                49                        1   \n",
       "2              2009                49                        1   \n",
       "3              2009                49                        1   \n",
       "4              2009                49                        1   \n",
       "\n",
       "   InvoiceDate_day_of_month  InvoiceDate_day_of_year  InvoiceDate_weekend  \\\n",
       "0                         1                      335                    0   \n",
       "1                         1                      335                    0   \n",
       "2                         1                      335                    0   \n",
       "3                         1                      335                    0   \n",
       "4                         1                      335                    0   \n",
       "\n",
       "   InvoiceDate_month_start  InvoiceDate_month_end  InvoiceDate_quarter_start  \\\n",
       "0                        1                      0                          0   \n",
       "1                        1                      0                          0   \n",
       "2                        1                      0                          0   \n",
       "3                        1                      0                          0   \n",
       "4                        1                      0                          0   \n",
       "\n",
       "   InvoiceDate_quarter_end  InvoiceDate_year_start  InvoiceDate_year_end  \\\n",
       "0                        0                       0                     0   \n",
       "1                        0                       0                     0   \n",
       "2                        0                       0                     0   \n",
       "3                        0                       0                     0   \n",
       "4                        0                       0                     0   \n",
       "\n",
       "   InvoiceDate_leap_year  InvoiceDate_days_in_month  InvoiceDate_hour  \\\n",
       "0                      0                         31                 7   \n",
       "1                      0                         31                 7   \n",
       "2                      0                         31                 7   \n",
       "3                      0                         31                 7   \n",
       "4                      0                         31                 7   \n",
       "\n",
       "   InvoiceDate_minute  InvoiceDate_second  \n",
       "0                  45                   0  \n",
       "1                  45                   0  \n",
       "2                  45                   0  \n",
       "3                  45                   0  \n",
       "4                  45                   0  "
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Extract features.\n",
    "dft = dtfs.fit_transform(data)\n",
    "\n",
    "# Capture the names of the features we just created.\n",
    "# (Feature-engine tags them with the original variable name\n",
    "# plus the feature it extracted).\n",
    "vars_ = [v for v in dft.columns if \"InvoiceDate\" in v]\n",
    "\n",
    "# Show\n",
    "dft[vars_].head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['InvoiceDate']"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# The datetime variable, which was automatically\n",
    "# identified, is stored in an attribute.\n",
    "\n",
    "dtfs.variables_"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Extract most common features\n",
    "\n",
    "DatetimeFeatures can extract the most **common** features out-of-the-box."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [],
   "source": [
    "dtfs = DatetimeFeatures(\n",
    "    variables=None,  # it identifies the datetime variable automatically\n",
    "    features_to_extract=None,\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>InvoiceDate_month</th>\n",
       "      <th>InvoiceDate_year</th>\n",
       "      <th>InvoiceDate_day_of_week</th>\n",
       "      <th>InvoiceDate_day_of_month</th>\n",
       "      <th>InvoiceDate_hour</th>\n",
       "      <th>InvoiceDate_minute</th>\n",
       "      <th>InvoiceDate_second</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>12</td>\n",
       "      <td>2009</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>7</td>\n",
       "      <td>45</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>12</td>\n",
       "      <td>2009</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>7</td>\n",
       "      <td>45</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>12</td>\n",
       "      <td>2009</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>7</td>\n",
       "      <td>45</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>12</td>\n",
       "      <td>2009</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>7</td>\n",
       "      <td>45</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>12</td>\n",
       "      <td>2009</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>7</td>\n",
       "      <td>45</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   InvoiceDate_month  InvoiceDate_year  InvoiceDate_day_of_week  \\\n",
       "0                 12              2009                        1   \n",
       "1                 12              2009                        1   \n",
       "2                 12              2009                        1   \n",
       "3                 12              2009                        1   \n",
       "4                 12              2009                        1   \n",
       "\n",
       "   InvoiceDate_day_of_month  InvoiceDate_hour  InvoiceDate_minute  \\\n",
       "0                         1                 7                  45   \n",
       "1                         1                 7                  45   \n",
       "2                         1                 7                  45   \n",
       "3                         1                 7                  45   \n",
       "4                         1                 7                  45   \n",
       "\n",
       "   InvoiceDate_second  \n",
       "0                   0  \n",
       "1                   0  \n",
       "2                   0  \n",
       "3                   0  \n",
       "4                   0  "
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Extract features\n",
    "dft = dtfs.fit_transform(data)\n",
    "\n",
    "# Capture the names of the features we just created.\n",
    "# (Feature-engine tags them with the original variable name\n",
    "# plus the feature it extracted).\n",
    "vars_ = [v for v in dft.columns if \"InvoiceDate\" in v]\n",
    "\n",
    "# Show\n",
    "dft[vars_].head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Extract user defined features\n",
    "\n",
    "We can also extract a subset of features of choice."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [],
   "source": [
    "dtfs = DatetimeFeatures(\n",
    "    variables=None,  # it identifies the datetime variable automatically\n",
    "    features_to_extract=[\"week\", \"year\", \"day_of_month\", \"day_of_week\"],\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>InvoiceDate_week</th>\n",
       "      <th>InvoiceDate_year</th>\n",
       "      <th>InvoiceDate_day_of_month</th>\n",
       "      <th>InvoiceDate_day_of_week</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>49</td>\n",
       "      <td>2009</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>49</td>\n",
       "      <td>2009</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>49</td>\n",
       "      <td>2009</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>49</td>\n",
       "      <td>2009</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>49</td>\n",
       "      <td>2009</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   InvoiceDate_week  InvoiceDate_year  InvoiceDate_day_of_month  \\\n",
       "0                49              2009                         1   \n",
       "1                49              2009                         1   \n",
       "2                49              2009                         1   \n",
       "3                49              2009                         1   \n",
       "4                49              2009                         1   \n",
       "\n",
       "   InvoiceDate_day_of_week  \n",
       "0                        1  \n",
       "1                        1  \n",
       "2                        1  \n",
       "3                        1  \n",
       "4                        1  "
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Extract features\n",
    "dft = dtfs.fit_transform(data)\n",
    "\n",
    "# Capture the names of the features we just created.\n",
    "# (Feature-engine tags them with the original variable name\n",
    "# plus the feature it extracted).\n",
    "vars_ = [v for v in dft.columns if \"InvoiceDate\" in v]\n",
    "\n",
    "# Show\n",
    "dft[vars_].head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Working with timezones\n",
    "\n",
    "DatetimeFeatures also takes care of different time zones out-of-the-box."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>time</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>2014-08-01 09:00:00+02:00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2014-08-01 10:00:00+02:00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>2014-08-01 11:00:00+02:00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>2014-08-01 09:00:00-05:00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2014-08-01 10:00:00-05:00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>2014-08-01 11:00:00-05:00</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                        time\n",
       "0  2014-08-01 09:00:00+02:00\n",
       "1  2014-08-01 10:00:00+02:00\n",
       "2  2014-08-01 11:00:00+02:00\n",
       "0  2014-08-01 09:00:00-05:00\n",
       "1  2014-08-01 10:00:00-05:00\n",
       "2  2014-08-01 11:00:00-05:00"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# First, let's create a toy dataframe with some\n",
    "# timestamps in different time zones.\n",
    "\n",
    "df = pd.DataFrame()\n",
    "\n",
    "df[\"time\"] = pd.concat(\n",
    "    [\n",
    "        pd.Series(\n",
    "            pd.date_range(\n",
    "                start=\"2014-08-01 09:00\", freq=\"H\", periods=3, tz=\"Europe/Berlin\"\n",
    "            )\n",
    "        ),\n",
    "        pd.Series(\n",
    "            pd.date_range(\n",
    "                start=\"2014-08-01 09:00\", freq=\"H\", periods=3, tz=\"US/Central\"\n",
    "            )\n",
    "        ),\n",
    "    ],\n",
    "    axis=0,\n",
    ")\n",
    "\n",
    "df"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can see the different timezones indicated by the +2 and -5, with respect to the central meridian."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [],
   "source": [
    "dfts = DatetimeFeatures(\n",
    "    features_to_extract=[\"day_of_week\", \"hour\", \"minute\"],\n",
    "    drop_original=False,\n",
    "    utc=True,  # to handle timezones\n",
    ")\n",
    "\n",
    "# DatetimeFeatures will take all timestamps to utc\n",
    "# before deriving the features."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>time</th>\n",
       "      <th>time_day_of_week</th>\n",
       "      <th>time_hour</th>\n",
       "      <th>time_minute</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>2014-08-01 09:00:00+02:00</td>\n",
       "      <td>4</td>\n",
       "      <td>7</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2014-08-01 10:00:00+02:00</td>\n",
       "      <td>4</td>\n",
       "      <td>8</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>2014-08-01 11:00:00+02:00</td>\n",
       "      <td>4</td>\n",
       "      <td>9</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>2014-08-01 09:00:00-05:00</td>\n",
       "      <td>4</td>\n",
       "      <td>14</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2014-08-01 10:00:00-05:00</td>\n",
       "      <td>4</td>\n",
       "      <td>15</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                        time  time_day_of_week  time_hour  time_minute\n",
       "0  2014-08-01 09:00:00+02:00                 4          7            0\n",
       "1  2014-08-01 10:00:00+02:00                 4          8            0\n",
       "2  2014-08-01 11:00:00+02:00                 4          9            0\n",
       "0  2014-08-01 09:00:00-05:00                 4         14            0\n",
       "1  2014-08-01 10:00:00-05:00                 4         15            0"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "dft = dfts.fit_transform(df)\n",
    "\n",
    "dft.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "More details about Feature-engine's datetime transformer in the [user guide](https://feature-engine.readthedocs.io/en/latest/user_guide/datetime/DatetimeFeatures.html#datetime-features)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "fets",
   "language": "python",
   "name": "fets"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.2"
  },
  "toc": {
   "base_numbering": 1,
   "nav_menu": {},
   "number_sections": true,
   "sideBar": true,
   "skip_h1_title": false,
   "title_cell": "Table of Contents",
   "title_sidebar": "Contents",
   "toc_cell": false,
   "toc_position": {},
   "toc_section_display": "block",
   "toc_window_display": true
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
